http2: reduce per-request overhead on the server path #64265
Open
mcollina wants to merge 6 commits into
Cut several sources of per-stream/per-request overhead on the hot path:

- Track 'priority'/'frameError' stream listeners by overriding the EventEmitter methods on Http2Stream instead of subscribing to 'newListener'/'removeListener', which made every listener add and remove on every stream emit an extra tracking event.
- Replace the per-call SafeSet and sensitive-header mapping in buildNgHeaderString with a lazily allocated array and an empty-array fast path, and skip the HTTP token regex and connection-specific header checks for well-known single-value header names.
- Replace per-call closures with shared named handlers in onStreamClose, afterShutdown and Http2Stream._destroy.
- Skip the pendingStreams Set add/delete for streams that are created with their native handle already available (all server streams).
- Hoist the per-request onStreamTimeout closure factories in the compat layer to module-level handlers, and avoid a once() wrapper allocation per server stream.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating runs: core API 60.2k -> 69.3k req/s (+15%), compat API 43.6k -> 46.2k req/s (+5.9%).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
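The listener-tracking change in the first bullet can be illustrated with a minimal sketch. `TrackedStream`, `TRACKED` and `trackedListeners` are hypothetical names chosen for illustration and are not taken from the actual `Http2Stream` code; the sketch only assumes the public `EventEmitter` API.

```javascript
'use strict';
const { EventEmitter } = require('node:events');

// Events whose listeners we want to count (illustrative choice).
const TRACKED = new Set(['priority', 'frameError']);

// Hypothetical stand-in for Http2Stream: it counts tracked listeners by
// overriding the add/remove methods, so no 'newListener'/'removeListener'
// subscription is needed and unrelated events pay no extra emit.
class TrackedStream extends EventEmitter {
  constructor() {
    super();
    this.trackedListeners = 0; // illustrative counter, not a real field
  }

  addListener(event, listener) {
    if (TRACKED.has(event)) this.trackedListeners++;
    return super.addListener(event, listener);
  }

  on(event, listener) {
    return this.addListener(event, listener);
  }

  removeListener(event, listener) {
    const before = this.listenerCount(event);
    super.removeListener(event, listener);
    // Only decrement when a listener was actually removed.
    if (TRACKED.has(event) && this.listenerCount(event) < before) {
      this.trackedListeners--;
    }
    return this;
  }

  off(event, listener) {
    return this.removeListener(event, listener);
  }
}

const stream = new TrackedStream();
const onPriority = () => {};
stream.on('priority', onPriority);
stream.on('data', () => {}); // untracked: no bookkeeping at all
console.log(stream.trackedListeners); // 1
stream.off('priority', onPriority);
console.log(stream.trackedListeners); // 0
```

The sketch leaves out `prependListener()` and friends; a complete version would need to cover every method that can add a listener.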
The compat layer always responded with waitForTrailers set, so every response paid for a wantTrailers C++ -> JS callback, an empty sendTrailers() submission scheduled through setImmediate(), and an extra empty DATA frame on the wire, even though the vast majority of responses never register any trailers.

When the headers are flushed as part of response.end() and no trailers have been registered, there is no further opportunity to add trailers, so waitForTrailers can be skipped altogether. Headers flushed early (writeHead, write, flushHeaders) keep the previous behavior so trailers can still be added while streaming.

Trailers added after response.end() are now silently dropped, matching the HTTP/1 response.addTrailers() semantics.

Also reuse a shared options object for Http2ServerRequest instances created without explicit options.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating runs: compat API 43.1k -> 49.9k req/s (+15.7% cumulative vs main).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
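The rule described in this commit message can be written down as a small decision function. This is only a sketch of the stated condition; `shouldWaitForTrailers` and its parameter names are hypothetical and do not appear in the compat layer.

```javascript
'use strict';

// Hypothetical helper: decide whether the compat layer still needs
// waitForTrailers when it flushes the response headers.
function shouldWaitForTrailers({ flushedFromEnd, trailersRegistered }) {
  // Headers flushed by end() with no trailers registered: nothing can
  // add trailers afterwards, so the round trip can be skipped.
  if (flushedFromEnd && !trailersRegistered) return false;
  // Early flush (writeHead/write/flushHeaders) or trailers already set:
  // keep the previous behavior.
  return true;
}

console.log(shouldWaitForTrailers({ flushedFromEnd: true, trailersRegistered: false })); // false
console.log(shouldWaitForTrailers({ flushedFromEnd: true, trailersRegistered: true })); // true
console.log(shouldWaitForTrailers({ flushedFromEnd: false, trailersRegistered: false })); // true
```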
Every _write()/_writev() on an Http2Stream allocated four closures and an anonymous nextTick callback to coordinate the write callback with the end-of-stream check. Since the stream machinery dispatches at most one write at a time, that coordination state can live on the stream's kState object instead, with shared named functions for the end check and completion logic.

When trailers are pending the writable side cannot be shut down early anyway, so the end-of-stream check tick is now skipped entirely for those writes.

Also pre-initialize the kState fields that used to be added dynamically (shutdownWritableCalled, fd) so hot-path stores no longer transition the object shape.

h2load, 1 KiB response payload, -c 4 -m 100, mean of 6 alternating runs vs main: core API 61.0k -> 70.7k req/s (+15.9% cumulative), compat API 43.7k -> 50.4k req/s (+15.3% cumulative).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
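A rough sketch of the idea of moving per-write coordination state onto a pre-shaped state object, assuming at most one write is in flight. `SketchStream`, `beginWrite`, `onWriteComplete` and the state field names are hypothetical and much simpler than the real `kWriteGeneric` logic.

```javascript
'use strict';

const kState = Symbol('state'); // stands in for the internal kState symbol

// Shared completion handler: it takes the stream as an argument and reads
// its context from the state object instead of closing over per-write
// variables, so no function is allocated per write.
function onWriteComplete(stream, err) {
  const state = stream[kState];
  const cb = state.writeCallback;
  state.writeCallback = null;
  state.writeInFlight = false;
  cb(err);
}

class SketchStream {
  constructor() {
    // Every field is initialized up front so later stores do not change
    // the object's shape.
    this[kState] = { writeInFlight: false, writeCallback: null };
  }

  beginWrite(cb) {
    const state = this[kState];
    if (state.writeInFlight) throw new Error('only one write may be in flight');
    state.writeInFlight = true;
    state.writeCallback = cb;
  }
}

const stream = new SketchStream();
let result;
stream.beginWrite((err) => { result = err ?? 'ok'; });
onWriteComplete(stream, null); // the real completion would arrive asynchronously
console.log(result); // 'ok'
```

When the completion has to be deferred, `process.nextTick(onWriteComplete, stream, null)` passes the arguments through without allocating a wrapper closure.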
When the compat layer flushes response headers before the response is ended (writeHead(), write(), flushHeaders()), it must keep waitForTrailers so that trailers can still be added while streaming. As a result, every such response paid for a wantTrailers C++ -> JS callback, an empty sendTrailers() with its setImmediate(), and a trailers() call back into C++, even though most responses never register any trailers.

Introduce STREAM_OPTION_AUTO_EMPTY_TRAILERS: when set and no trailers have been handed to the native side by the time the final DATA frame is sent, the stream is finished directly in C++ with the same empty DATA frame carrying END_STREAM that the JS path would have produced, without calling into JS at all. The compat layer enables this mode whenever it responds with waitForTrailers and no trailers registered yet; a later setTrailer() call flips the stream back to JS-managed trailers through a new disableAutoTrailers() binding, so streaming trailers keep working unchanged. The wire format is identical in all cases.

h2load -c 4 -m 100, 1 KiB payload, mean of 8 alternating runs against the previous commit: compat writeHead()+end() 47.8k -> 50.2k req/s (+5.0%); multi-write streaming responses +1%.

Signed-off-by: Matteo Collina <hello@matteocollina.com>
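The mode switch described above can be modelled in a few lines of JavaScript. This is a toy model only: the real flag lives on the native side, and `TrailerModeSketch`, `onFinalData` and the returned strings are hypothetical.

```javascript
'use strict';

// Hypothetical model of the per-stream trailer mode.
class TrailerModeSketch {
  constructor() {
    // Stands in for STREAM_OPTION_AUTO_EMPTY_TRAILERS being set at respond time.
    this.autoEmptyTrailers = true;
    this.trailers = null;
  }

  // Stands in for the compat setTrailer() path, which would call the
  // disableAutoTrailers() binding to hand control back to JS.
  setTrailer(name, value) {
    this.autoEmptyTrailers = false;
    this.trailers ??= { __proto__: null };
    this.trailers[name] = value;
  }

  // Models the decision taken when the final DATA frame is sent.
  onFinalData() {
    return this.autoEmptyTrailers ? 'finish-natively' : 'ask-js-for-trailers';
  }
}

const plain = new TrailerModeSketch();
console.log(plain.onFinalData()); // 'finish-natively'

const withTrailers = new TrailerModeSketch();
withTrailers.setTrailer('x-checksum', 'abc');
console.log(withTrailers.onFinalData()); // 'ask-js-for-trailers'
```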
Two per-request scheduling eliminations:

- The end-of-stream check that lets the final DATA frame carry the END_STREAM flag was scheduled with process.nextTick() on every write. When the write is dispatched from inside end() (the common case of end(chunk)), the check can instead run synchronously once end() returns and the writable state has settled. An end() override marks the stream while the base method runs, and [kWriteGeneric] hands the check back to it instead of scheduling a tick. Writes not tied to end() keep the nextTick behavior.
- Every stream destruction scheduled a setImmediate() to ask the session to clean itself up, but Http2Session[kMaybeDestroy] is a no-op unless the session is closed and has no remaining streams. Gate the setImmediate() on that condition: session.close() runs its own check, and the native side notifies again through ongracefulclosecomplete once pending data is flushed.

The wire format is unchanged (verified byte-identical h2load traffic), and the END_STREAM merge is preserved.

h2load -c 4 -m 100, 1 KiB payload, alternating runs vs the previous commit: consistently around +1% (within run-to-run noise on any single set, positive across 42 paired samples).

Signed-off-by: Matteo Collina <hello@matteocollina.com>
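The second bullet (gating the `setImmediate()` on the only condition where the check can do anything) might look roughly like this. `SketchSession`, `maybeDestroy`, `onStreamDestroyed` and `scheduledChecks` are hypothetical; the counter exists only to make the effect observable.

```javascript
'use strict';

// Hypothetical cleanup check: a no-op unless the session is closed and idle.
function maybeDestroy(session) {
  if (session.closed && session.streams.size === 0) {
    session.destroyed = true;
  }
}

class SketchSession {
  constructor() {
    this.closed = false;
    this.destroyed = false;
    this.streams = new Set();
    this.scheduledChecks = 0; // illustrative counter of scheduled immediates
  }

  onStreamDestroyed(stream) {
    this.streams.delete(stream);
    // Only schedule the check when it could actually do something.
    if (this.closed && this.streams.size === 0) {
      this.scheduledChecks++;
      setImmediate(maybeDestroy, this); // extra args avoid a wrapper closure
    }
  }
}

const session = new SketchSession();
const a = {};
const b = {};
session.streams.add(a).add(b);

session.onStreamDestroyed(a); // session still open: nothing scheduled
console.log(session.scheduledChecks); // 0

session.closed = true;
session.onStreamDestroyed(b); // closed and empty: one check scheduled
console.log(session.scheduledChecks); // 1
```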
respond() copied the user-provided options object on every call just so it could normalize and locally flip options.endStream, and prepareResponseHeadersObject() then looked the :status and date fields up again on the dictionary-mode null-prototype headers copy it had just built. Use a local variable for endStream and pick up :status/date while copying the headers instead.

No measurable throughput change on its own; this removes an object clone and several dictionary-mode property lookups per response.

Signed-off-by: Matteo Collina <hello@matteocollina.com>
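Both parts of this change can be sketched in isolation. `resolveEndStream`, `copyResponseHeaders` and the `isHeadRequest` condition are hypothetical simplifications, not the actual `respond()` or `prepareResponseHeadersObject()` code.

```javascript
'use strict';

// Hypothetical simplification of the endStream handling: read the caller's
// option into a local instead of cloning the options object just to
// overwrite one field on the copy.
function resolveEndStream(options, isHeadRequest) {
  let endStream = !!options?.endStream;
  if (isHeadRequest) endStream = true; // illustrative override condition
  return endStream;
}

// Hypothetical header copy that records :status and date in the same pass,
// so they never have to be looked up on the null-prototype copy afterwards.
function copyResponseHeaders(headers) {
  const copy = { __proto__: null };
  let status;
  let hasDate = false;
  for (const key of Object.keys(headers)) {
    const value = headers[key];
    copy[key] = value;
    if (key === ':status') status = value;
    else if (key === 'date') hasDate = true;
  }
  return { copy, status, hasDate };
}

const userOptions = Object.freeze({ endStream: false });
console.log(resolveEndStream(userOptions, true)); // true
console.log(userOptions.endStream); // false (the caller's object is untouched)

const result = copyResponseHeaders({ ':status': 204, 'content-type': 'text/plain' });
console.log(result.status, result.hasDate); // 204 false
```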
metcoder95
approved these changes
Jul 3, 2026
This PR cuts a number of per-request/per-stream costs on the HTTP/2 server hot path, in six commits.
1.
http2: reduce per-request allocations

- Track `'priority'`/`'frameError'` stream listeners by overriding the `EventEmitter` methods on `Http2Stream` instead of subscribing to `'newListener'`/`'removeListener'`. The previous approach made every listener add/remove on every stream emit an extra tracking event (the compat layer alone adds 11 listeners per request).
- `buildNgHeaderString`: replace the per-call `SafeSet` with a lazily allocated array, skip the sensitive-headers `map()` when there are none (the common case), and skip the HTTP-token regex plus connection-specific-header checks for well-known single-value header names: they are all valid tokens and none of them is connection-specific.
- Replace per-call closures with shared named handlers in `onStreamClose` (natural close path), `afterShutdown` and `Http2Stream._destroy`.
- Skip the `pendingStreams` Set add/delete for streams created with their native handle already available (all server streams).
- Hoist the per-request `onStreamTimeout` closure factories in the compat layer, and avoid a `once()` wrapper allocation per server stream.

2.
http2: skip trailers round trip for compat responses

The compat layer always responded with `waitForTrailers` set, so every response paid for a `wantTrailers` C++ → JS callback, an empty `sendTrailers()` submission scheduled through `setImmediate()`, and an extra empty DATA frame on the wire, even though the vast majority of responses never register trailers.

When the headers are flushed as part of `response.end()` and no trailers have been registered, there is no further opportunity to add trailers, so `waitForTrailers` is now skipped. Headers flushed early (`writeHead()`, `write()`, `flushHeaders()`) keep the previous behavior, so trailers can still be added while streaming.

Behavior note for reviewers: trailers added after `response.end()` are now silently dropped. This matches HTTP/1 `response.addTrailers()` semantics (docs updated accordingly).

3.
http2: avoid per-write closures in kWriteGeneric

Every `_write()`/`_writev()` allocated four closures plus an anonymous `nextTick` callback to coordinate the write callback with the end-of-stream check. Since the stream machinery dispatches at most one write at a time, that state now lives on the stream's `kState` object with shared named functions. When trailers are pending, the end-of-stream check tick is skipped entirely (the writable side cannot be shut down early anyway). Also pre-initializes the dynamically added `kState` fields (`shutdownWritableCalled`, `fd`) so hot-path stores no longer transition the object shape.

4.
http2: finish empty trailers natively for compat streams

Compat responses that flush headers before `end()` (`writeHead()`/`write()`/`flushHeaders()`) must keep `waitForTrailers`, so they paid a `wantTrailers` C++ → JS callback, an empty `sendTrailers()` + `setImmediate()`, and a `trailers()` call back into C++ on every response. A new internal `STREAM_OPTION_AUTO_EMPTY_TRAILERS` lets C++ finish the stream itself (same empty DATA + END_STREAM frame, identical wire format) when JS never registered trailers; a later `setTrailer()` flips the stream back to JS-managed trailers via a new `disableAutoTrailers()` binding, so streaming trailers work unchanged (regression test added). Compat `writeHead()` + `end()`: +5.0% vs the previous commit (47.8k → 50.2k req/s, 8 alternating runs); multi-write streaming ~+1%.

5.
http2: reduce scheduled callbacks per request

The end-of-stream check (which merges END_STREAM into the final DATA frame) was a `process.nextTick()` on every write; when the write is dispatched from inside `end()` (the common `end(chunk)` case), an `end()` override now runs the check synchronously after the base method returns. The `setImmediate()` scheduled on every stream destruction to poke `Http2Session[kMaybeDestroy]` is now gated on the only condition where it isn't a no-op (session closed, no remaining streams); `session.close()` and the native `ongracefulclosecomplete` notification cover the other paths. Wire format verified byte-identical. Consistently ~+1% across 42 paired samples (within single-run noise).

6.
http2: avoid copying the options in respond()

Drops the per-response `{ ...options }` clone (respond only reads the options now) and picks up `:status`/`date` while copying the response headers instead of re-reading them from the dictionary-mode copy. Throughput-neutral on its own; removes an object clone and several dictionary lookups per response.

Negative results from the megamorphic-IC investigation (for the record)
`--log-ic` tracing under load shows the dominant megamorphic sites are in the events machinery (`_events`/`_eventsCount` loads and `events[type]` keyed loads across heterogeneous emitter shapes), a node-wide property of `EventEmitter` that is not addressable from http2. The header-object keyed stores/loads (`toHeaderObject`, header copies) are inherently megamorphic: a single keyed-store site writing several different keys always goes megamorphic regardless of repeating shapes. Replacing `ObjectKeys()` + keyed loads with `for-in` in `buildNgHeaderString` was tried and regressed header-heavy workloads (−1.7% at nheaders=1000, 99.9% confidence), because large null-prototype copies are dictionary-mode, where for-in has no enum-cache fast path, so it was reverted.

Benchmarks
h2load (`-c 4 -m 100`, 1 KiB payload, mean of 6 alternating runs):

[Results table not preserved; surviving row labels: (`stream.respond` + `end`), (`res.setHeader` + `end`)]

`benchmark/compare.js` (10 runs):

[Results table not preserved]

(`compat.js`/`write.js`/`simple.js` stream a file from `fs` per request, so they are dominated by file streaming and mostly insensitive to per-request overhead; no regressions.)